3 research outputs found

    Are We There Yet?: The Development of a Corpus Annotated for Social Acts in Multilingual Online Discourse

    We present the AAWD and AACD corpora, a collection of discussions drawn from Wikipedia talk pages and small group IRC discussions in English, Russian and Mandarin. Our datasets are annotated with labels capturing two kinds of social acts: alignment moves and authority claims. We describe these social acts and our annotation process, highlight challenges we encountered and strategies we employed during annotation, and present analyses of the resulting data set that illustrate the utility of our corpus and identify interactions among social acts, and between participant status and social acts, in online discourse.

    Markers of contrast in Russian: A corpus-based study

    Thesis (Master's)--University of Washington, 2013

    AGGREGATION

    This archive is associated with the AGGREGATION project, which seeks to automatically generate HPSG grammars on the basis of Interlinear Glossed Text data. For a detailed description of this project, see Chapter 3 of Inferring Grammars from Interlinear Glossed Text: Extracting Typological and Lexical Properties for the Automatic Generation of HPSG Grammars, PhD thesis by Kristen Howell, 2020.
    This archive includes the following: the AGGREGATION/BASIL syntactic inference repository from https://git.ling.washington.edu/agg/aggregation; the MOM morphological inference repository from https://git.ling.washington.edu/agg/mom; the Xigt framework for eXtensible Interlinear Glossed Text, release 1.1, from https://github.com/xigt/xigt; the Grammar Matrix Customization system (http://matrix.ling.washington.edu/index.html); and code, dependencies and sample data for running the AGGREGATION pipeline end to end.
    The AGGREGATION Project aims to bring the benefits of grammar engineering to language documentation without requiring field linguists to become grammar engineers. We achieve this by automatically creating precision grammars on the basis of analyses and annotations already produced by field linguists, together with a typologically grounded cross-linguistic grammar resource (the LinGO Grammar Matrix) and natural language processing techniques developed for high-resource languages. Precision grammars are machine-readable encodings of mutually consistent linguistic hypotheses, in our case concerning morphotactics, morphosyntax and the syntax-semantics interface. They can be used to automatically process text, assigning structures to input strings and strings to input semantic representations. Text processed in this way can then be searched for sentences or word forms with structures of interest, or for items that are not covered by the grammar (i.e., that fall outside current hypotheses).
    National Science Foundation under Grant No. BCS-1160274 (PI Bender); National Science Foundation under Grant No. BCS-1561833 (PI Bender).